{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Notes on computations in ForwardHG class"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let $s=(s^i)_{i=1}^C$ be the state vector divided in its components (e.g. vaious weights of a neural network, accumulated gradients, ...). Assume that each component $s^i\\in\\mathbb{R}^{d_i}$ is a (column) vector as well as $s\\in\\mathbb{R}^d$, with $d=\\sum_i d_i$. We should compute the update of the total derivative of $s_t$ w.r.t. a single scalar hyperparameter $\\lambda$. $s_t$ is the $t-$th iterate of the mapping $\\Phi$:\n",
    "$$\n",
    "  s_0 = \\Phi_0(\\lambda) \\qquad\n",
    "    s_t =  \\Phi_t(s_{t-1},\\lambda),\\quad t \\in \\{1,\\dots,T\\}\n",
    "$$.\n",
    "Let $\\Phi^i$ denote the components of the iterative mapping relative to $i$-th state vector. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Calling\n",
    "$$\n",
    "A_t = \\partial_s \\Phi_t(s_{t-1},\\lambda) \\in \\mathbb{R}^{d\\times d}  \\qquad B_t = \\partial_{\\lambda} \\Phi_t(s_{t-1},\\lambda)\\in \\mathbb{R}^d, \n",
    "$$\n",
    "the update on the variable $Z=\\frac{\\mathrm{d} s}{\\mathrm{d} \\lambda}$ will be \n",
    "\\begin{equation}\n",
    "Z_t = A_t Z_{t-1} + B_t.  \\end{equation}\n",
    "Let $Z=(Z^i)_{i=1}^C$ be also divided in its component, where each $Z^i$ has the same dimensionality of $s^i$ (sice the hyperparameter is a scalar). Componentwise, the above update becomes\n",
    "$$\n",
    "Z^i_t = \\partial_s \\Phi^i_{t}(s_{t-1}, \\lambda)^T Z_{t-1} + \\partial_{\\lambda} \\Phi_t^i(s_{t-1}, \\lambda) = \n",
    "\\sum_{j=1}^C \\partial_{s^j} \\Phi^i_t(s_{t-1}, \\lambda)^T Z^j_{t-1}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### How to compute the $Z$-update using only scalar gradients"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Introduce the auxiliary variable $v=(v_i)_{i=1}^C$ divided in the same way as the state $s$. \n",
    "Then, dropping the iteration index for simplicity, $B$ can be computed as\n",
    "$$\n",
    "B = \\partial_v \\left[\\sum_{i=1}^C \\partial_{\\lambda} ( \\Phi^i(s, \\lambda)^T v_i)  \\right]\n",
    "$$\n",
    "by noting that $\\Phi^i(s, \\lambda)^T v_i$ is a scalar quantity as well as $\\partial_{\\lambda} ( \\Phi^i(s, \\lambda)^T v_i)$. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the computation of $A_t Z_{t-1}$, instead, (dropping the iteration index) define\n",
    "$$\n",
    "\\Psi_i = \\partial_{s^i} \\left[\\sum_{j=1}^C  \\Phi^j(s, \\lambda)^T v_j  \\right].\n",
    "$$\n",
    "Then\n",
    "$$\n",
    "\\partial_s \\Phi^i(s, \\lambda)^T Z = \\partial_{v_i} \\left[\\sum_{j=1}^C \\Psi_j^TZ^j \\right],\n",
    "$$\n",
    "which proves to be the right quantity by excanging the order of summations and derivatives, and makes use only of gradients of scalar functions. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "On a practical note... the variables $v_i$ may take any value since they do not appear in the final computation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [conda root]",
   "language": "python",
   "name": "conda-root-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
